Use mutate to add new variables or modify existing ones.
For example, the pulse dataset has two pulse measurements, pulse1 and pulse2. Suppose we are interested in the average pulse and want it to be available as a separate variable, e.g. averagePulse, in the pulse tibble. We can do this with:
mutate(pulse, averagePulse = (pulse1+pulse2)/2)
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 173 57 18 female no yes modera… sat 86 88
2 1993_B Mela… 179 58 19 female no yes modera… ran 82 150
3 1993_C Cons… 167 62 18 female no yes high ran 96 176
4 1993_D Trav… 195 84 18 male no yes high sat 71 73
5 1993_E Lauri 173 64 18 female no yes low sat 90 88
6 1993_F Geor… 184 74 22 male no yes low ran 78 141
7 1993_G Cher… 162 57 20 female no yes modera… sat 68 72
8 1993_H Fran… 169 55 18 female no yes modera… sat 71 77
9 1993_I Sonja 164 56 19 female no yes high sat 68 68
10 1993_J Troy 168 60 23 male no yes modera… ran 88 150
# … with 100 more rows, 2 more variables: year <dbl>, averagePulse <dbl>, and
# abbreviated variable name ¹exercise
By default the new column is added at the last position in the tibble.
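If a different position is preferred, mutate accepts the .before and .after arguments (available in dplyr 1.0.0 and later). A minimal sketch, assuming a small stand-in tibble, pulse_demo, built from the first three rows shown above rather than the full pulse dataset:

```r
library(dplyr)

# pulse_demo is a hypothetical stand-in for the pulse tibble (three rows, four columns)
pulse_demo <- tibble(
  id     = c("1993_A", "1993_B", "1993_C"),
  ran    = c("sat", "ran", "ran"),
  pulse1 = c(86, 82, 96),
  pulse2 = c(88, 150, 176)
)

# place the new column directly after id instead of at the end
placed <- mutate(pulse_demo, averagePulse = (pulse1 + pulse2) / 2, .after = id)
names(placed)
```

Using .before = pulse1 instead would put the new column immediately in front of pulse1.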
Does the pulse tibble now contain the variable averagePulse?
No. mutate returns a modified copy and leaves the original tibble unchanged. If you want to keep the new variable averagePulse, you’ll need to use the assignment operator ‘<-’ to replace the original pulse tibble with the newly modified version:
pulse <- mutate(pulse, averagePulse = (pulse1+pulse2)/2)
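The difference between calling mutate with and without assignment can be checked directly. The sketch below assumes a hypothetical two-column stand-in, pulse_demo, in place of the real pulse tibble:

```r
library(dplyr)

# pulse_demo is a hypothetical stand-in for the pulse tibble
pulse_demo <- tibble(pulse1 = c(86, 82, 96), pulse2 = c(88, 150, 176))

# without assignment the modified copy is discarded
invisible(mutate(pulse_demo, averagePulse = (pulse1 + pulse2) / 2))
before <- "averagePulse" %in% names(pulse_demo)   # column is not there

# with assignment the tibble is replaced by the modified copy
pulse_demo <- mutate(pulse_demo, averagePulse = (pulse1 + pulse2) / 2)
after <- "averagePulse" %in% names(pulse_demo)    # column is now there
```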
Take as another example the variable BMI: \[BMI=\frac{weight_{kg}}{{height_m}^2}\]
Note that the BMI definition requires weight in kilograms and height in metres. In the pulse dataset weight is given in kilograms but height is in centimetres. We can first create a new variable height_metre containing the height in metres and then calculate BMI:
pulse_bmi <- mutate(pulse, height_metre=height/100) # convert centimetres to metres
pulse_bmi
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 173 57 18 female no yes modera… sat 86 88
2 1993_B Mela… 179 58 19 female no yes modera… ran 82 150
3 1993_C Cons… 167 62 18 female no yes high ran 96 176
4 1993_D Trav… 195 84 18 male no yes high sat 71 73
5 1993_E Lauri 173 64 18 female no yes low sat 90 88
6 1993_F Geor… 184 74 22 male no yes low ran 78 141
7 1993_G Cher… 162 57 20 female no yes modera… sat 68 72
8 1993_H Fran… 169 55 18 female no yes modera… sat 71 77
9 1993_I Sonja 164 56 19 female no yes high sat 68 68
10 1993_J Troy 168 60 23 male no yes modera… ran 88 150
# … with 100 more rows, 2 more variables: year <dbl>, height_metre <dbl>, and
# abbreviated variable name ¹exercise
The pulse_bmi tibble now has the height in metres, so we can calculate BMI:
pulse_bmi <- mutate(pulse_bmi, BMI=weight/(height_metre^2))
pulse_bmi
# A tibble: 110 × 15
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 173 57 18 female no yes modera… sat 86 88
2 1993_B Mela… 179 58 19 female no yes modera… ran 82 150
3 1993_C Cons… 167 62 18 female no yes high ran 96 176
4 1993_D Trav… 195 84 18 male no yes high sat 71 73
5 1993_E Lauri 173 64 18 female no yes low sat 90 88
6 1993_F Geor… 184 74 22 male no yes low ran 78 141
7 1993_G Cher… 162 57 20 female no yes modera… sat 68 72
8 1993_H Fran… 169 55 18 female no yes modera… sat 71 77
9 1993_I Sonja 164 56 19 female no yes high sat 68 68
10 1993_J Troy 168 60 23 male no yes modera… ran 88 150
# … with 100 more rows, 3 more variables: year <dbl>, height_metre <dbl>, BMI <dbl>,
# and abbreviated variable name ¹exercise
Alternatively, you may skip the creation of height_metre and calculate BMI directly from the pulse tibble:
pulse_bmi <- mutate(pulse, BMI=weight/((height/100)^2))
pulse_bmi
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 173 57 18 female no yes modera… sat 86 88
2 1993_B Mela… 179 58 19 female no yes modera… ran 82 150
3 1993_C Cons… 167 62 18 female no yes high ran 96 176
4 1993_D Trav… 195 84 18 male no yes high sat 71 73
5 1993_E Lauri 173 64 18 female no yes low sat 90 88
6 1993_F Geor… 184 74 22 male no yes low ran 78 141
7 1993_G Cher… 162 57 20 female no yes modera… sat 68 72
8 1993_H Fran… 169 55 18 female no yes modera… sat 71 77
9 1993_I Sonja 164 56 19 female no yes high sat 68 68
10 1993_J Troy 168 60 23 male no yes modera… ran 88 150
# … with 100 more rows, 2 more variables: year <dbl>, BMI <dbl>, and abbreviated
# variable name ¹exercise
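A third option keeps height_metre but creates both variables in a single mutate call, since later expressions can refer to columns created earlier in the same call. A minimal sketch, assuming a hypothetical stand-in tibble, pulse_demo, with the height and weight values of the first three rows shown above:

```r
library(dplyr)

# pulse_demo is a hypothetical stand-in for the pulse tibble
pulse_demo <- tibble(height = c(173, 179, 167), weight = c(57, 58, 62))

pulse_bmi_demo <- mutate(
  pulse_demo,
  height_metre = height / 100,       # convert centimetres to metres
  BMI = weight / height_metre^2      # uses the column created one line above
)
pulse_bmi_demo
```

For the first row this evaluates 57 / 1.73^2, the same calculation as in the two-step version.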
In the examples above we added a new variable to our dataset, but you can also update an existing variable. For example, let’s say we want to have the age expressed (roughly) in days instead of years:
mutate(pulse, age=age*365)
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 173 57 6570 female no yes modera… sat 86 88
2 1993_B Mela… 179 58 6935 female no yes modera… ran 82 150
3 1993_C Cons… 167 62 6570 female no yes high ran 96 176
4 1993_D Trav… 195 84 6570 male no yes high sat 71 73
5 1993_E Lauri 173 64 6570 female no yes low sat 90 88
6 1993_F Geor… 184 74 8030 male no yes low ran 78 141
7 1993_G Cher… 162 57 7300 female no yes modera… sat 68 72
8 1993_H Fran… 169 55 6570 female no yes modera… sat 71 77
9 1993_I Sonja 164 56 6935 female no yes high sat 68 68
10 1993_J Troy 168 60 8395 male no yes modera… ran 88 150
# … with 100 more rows, 1 more variable: year <dbl>, and abbreviated variable name
# ¹exercise
Here we keep the variable age but change its unit from years to days.
As another example, we can convert height and weight from metric to imperial units, using 1 kg ≈ 2.2 lbs and 1 inch = 2.54 cm:
mutate(pulse, height=height/2.54, weight=weight*2.2)
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 68.1 125. 18 female no yes modera… sat 86 88
2 1993_B Mela… 70.5 128. 19 female no yes modera… ran 82 150
3 1993_C Cons… 65.7 136. 18 female no yes high ran 96 176
4 1993_D Trav… 76.8 185. 18 male no yes high sat 71 73
5 1993_E Lauri 68.1 141. 18 female no yes low sat 90 88
6 1993_F Geor… 72.4 163. 22 male no yes low ran 78 141
7 1993_G Cher… 63.8 125. 20 female no yes modera… sat 68 72
8 1993_H Fran… 66.5 121 18 female no yes modera… sat 71 77
9 1993_I Sonja 64.6 123. 19 female no yes high sat 68 68
10 1993_J Troy 66.1 132 23 male no yes modera… ran 88 150
# … with 100 more rows, 1 more variable: year <dbl>, and abbreviated variable name
# ¹exercise
In the previous examples we updated or added variables with simple arithmetic, and every value went through the same calculation. However, there are situations where we would like to treat values conditionally. This is possible with the helper function if_else.
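if_else takes a logical condition followed by the value to use where the condition is TRUE and the value to use where it is FALSE, and it works element by element. A minimal sketch on a plain vector, where the values and the threshold of 100 are chosen only for illustration:

```r
library(dplyr)

pulse_values <- c(68, 86, 150)   # hypothetical example values

# "high" where the condition is TRUE, "normal" where it is FALSE
pulse_label <- if_else(pulse_values > 100, "high", "normal")
pulse_label
```

Unlike base R’s ifelse, if_else requires the TRUE and FALSE values to have compatible types.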
Examples:
Add a new variable max_pulse reporting the higher pulse rate of the two measurements pulse1 and pulse2 for each observation:
mutate(pulse, max_pulse=if_else(pulse1 < pulse2, pulse2, pulse1))
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 173 57 18 female no yes modera… sat 86 88
2 1993_B Mela… 179 58 19 female no yes modera… ran 82 150
3 1993_C Cons… 167 62 18 female no yes high ran 96 176
4 1993_D Trav… 195 84 18 male no yes high sat 71 73
5 1993_E Lauri 173 64 18 female no yes low sat 90 88
6 1993_F Geor… 184 74 22 male no yes low ran 78 141
7 1993_G Cher… 162 57 20 female no yes modera… sat 68 72
8 1993_H Fran… 169 55 18 female no yes modera… sat 71 77
9 1993_I Sonja 164 56 19 female no yes high sat 68 68
10 1993_J Troy 168 60 23 male no yes modera… ran 88 150
# … with 100 more rows, 2 more variables: year <dbl>, max_pulse <dbl>, and
# abbreviated variable name ¹exercise
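For this particular case, base R’s pmax (element-wise maximum) should give the same result. The sketch below is an alternative to the if_else approach used above, not a replacement for it, and assumes a hypothetical stand-in tibble, pulse_demo, holding three pulse pairs taken from the rows shown:

```r
library(dplyr)

# pulse_demo is a hypothetical stand-in for the pulse tibble
pulse_demo <- tibble(pulse1 = c(86, 82, 90), pulse2 = c(88, 150, 88))

compared <- mutate(
  pulse_demo,
  max_if   = if_else(pulse1 < pulse2, pulse2, pulse1),  # conditional version
  max_pmax = pmax(pulse1, pulse2)                        # element-wise maximum
)
compared
```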
Add a logical variable adult which is true if age>=18 and false otherwise:
mutate(pulse, adult=if_else(age >= 18 , TRUE, FALSE))
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 173 57 18 female no yes modera… sat 86 88
2 1993_B Mela… 179 58 19 female no yes modera… ran 82 150
3 1993_C Cons… 167 62 18 female no yes high ran 96 176
4 1993_D Trav… 195 84 18 male no yes high sat 71 73
5 1993_E Lauri 173 64 18 female no yes low sat 90 88
6 1993_F Geor… 184 74 22 male no yes low ran 78 141
7 1993_G Cher… 162 57 20 female no yes modera… sat 68 72
8 1993_H Fran… 169 55 18 female no yes modera… sat 71 77
9 1993_I Sonja 164 56 19 female no yes high sat 68 68
10 1993_J Troy 168 60 23 male no yes modera… ran 88 150
# … with 100 more rows, 2 more variables: year <dbl>, adult <lgl>, and abbreviated
# variable name ¹exercise
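Because the comparison age >= 18 already yields a logical vector, the if_else wrapper can be dropped when the two outcomes are simply TRUE and FALSE. A small sketch, assuming a hypothetical stand-in tibble, pulse_demo, in which the age 17 was added only to show a FALSE result:

```r
library(dplyr)

# pulse_demo is a hypothetical stand-in; 17 is not taken from the pulse data
pulse_demo <- tibble(age = c(18, 19, 22, 17))

result <- mutate(
  pulse_demo,
  adult_if = if_else(age >= 18, TRUE, FALSE),  # version used above
  adult    = age >= 18                         # the comparison on its own
)
result
```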
Convert gender values, female to f and male to m:
mutate(pulse, gender = if_else(gender == 'female' , 'f', 'm'))
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exerc…¹ ran pulse1 pulse2
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1993_A Bonn… 173 57 18 f no yes modera… sat 86 88
2 1993_B Mela… 179 58 19 f no yes modera… ran 82 150
3 1993_C Cons… 167 62 18 f no yes high ran 96 176
4 1993_D Trav… 195 84 18 m no yes high sat 71 73
5 1993_E Lauri 173 64 18 f no yes low sat 90 88
6 1993_F Geor… 184 74 22 m no yes low ran 78 141
7 1993_G Cher… 162 57 20 f no yes modera… sat 68 72
8 1993_H Fran… 169 55 18 f no yes modera… sat 71 77
9 1993_I Sonja 164 56 19 f no yes high sat 68 68
10 1993_J Troy 168 60 23 m no yes modera… ran 88 150
# … with 100 more rows, 1 more variable: year <dbl>, and abbreviated variable name
# ¹exercise
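Note that this if_else call recodes every non-missing value other than 'female' to 'm'. If the column might contain other values, dplyr’s case_when allows each category to be mapped explicitly. A sketch under that assumption, using a hypothetical stand-in tibble, pulse_demo, with one missing and one unexpected entry that are not part of the pulse data:

```r
library(dplyr)

# hypothetical values, including one missing and one unexpected entry
pulse_demo <- tibble(gender = c("female", "male", NA, "unknown"))

recoded <- mutate(
  pulse_demo,
  gender_if = if_else(gender == "female", "f", "m"),  # "unknown" becomes "m"
  gender_cw = case_when(
    gender == "female" ~ "f",
    gender == "male"   ~ "m",
    TRUE               ~ NA_character_                # anything else becomes NA
  )
)
recoded
```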
Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC